Script-description Pair Extraction from Text Documents of English as Second Language Podcast
Authors
Abstract
One of the most effective ways to learn a language is to converse with a native speaker. However, this is often very expensive. A good alternative is to use Dialog-Based Computer Assisted Language Learning (DB-CALL) systems. The quality of feedback in DB-CALL systems is very important. Therefore, to provide a variety of expressions as feedback, we propose a method that extracts pairs of scripts and their description sentences from an English as a Second Language (ESL) podcast web site. A linear CRF (conditional random field) classifier is used to find the corresponding description sentences, and several features are selected according to the characteristics of the ESL text documents. The experimental results show that the performance of our system is acceptable.
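As a rough illustration of what per-sentence features for such a classifier might look like, here is a minimal sketch. The abstract does not list the actual features, so every feature below (word overlap with the script, presence of a quotation, a definitional cue word) is an assumption, and the example sentences are invented rather than taken from the podcast. The function name `sentence_features` is hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def sentence_features(script, candidate, prev_candidate=None):
    """Build a feature dict for one (script, candidate sentence) pair.

    All features are hypothetical; they are not the ones used in the paper.
    """
    s, c = tokens(script), tokens(candidate)
    feats = {
        # Fraction of script words that reappear in the candidate sentence.
        "overlap": len(s & c) / len(s) if s else 0.0,
        # Explanations often quote the script expression directly.
        "has_quote": '"' in candidate,
        # Definitional cue words such as "means" or "refers to".
        "has_means": bool(re.search(r"\b(means?|refers? to)\b", candidate.lower())),
    }
    if prev_candidate is not None:
        # Context from the preceding sentence, since labels in a sequence
        # model depend on neighbouring sentences.
        p = tokens(prev_candidate)
        feats["prev_overlap"] = len(s & p) / len(s) if s else 0.0
    return feats


# Invented example data, for illustration only.
script = "I need to run some errands this afternoon."
sentences = [
    'Lucy says she needs to "run some errands."',
    "An errand means a short trip to do a small task.",
    "Thanks for listening to this episode.",
]
features = [
    sentence_features(script, sent, sentences[i - 1] if i else None)
    for i, sent in enumerate(sentences)
]
```

Each sentence in the explanation text becomes one feature dict, and the list of dicts forms the observation sequence that a sequence labeller would score to decide which sentences describe the script.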
Similar resources
Extraction de corpus parallèle pour la traduction automatique depuis et vers une langue peu dotée. (Extraction a parallel corpus for machine translation from and to under-resourced languages)
Nowadays, machine translation has reached good results when applied to several language pairs such as English – French, English – Chinese, English – Spanish, etc. Empirical translation, particularly statistical machine translation allows us to build quickly a translation system if adequate data is available because statistical machine translation is based on models trained from large parallel b...
The Impact of Lemmatization in Word Alignment
The focus of this thesis is on examining whether word alignment results can be improved in precision and recall through lemmatization, and extraction of lemma dictionaries from the resulting links. Lemmas are extracted from existing lexical resources in order to replace word forms in two parallel corpora documents, one featuring the language pair English-Swedish and the other the language pair ...
Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
Documents often have inherently parallel structure: they may consist of a text and commentaries, or an abstract and a body, or parts presenting alternative views on the same problem. Revealing relations between the parts by jointly segmenting and predicting links between the segments, would help to visualize such documents and construct friendlier user interfaces. To address this problem, we pr...
Re-reading the Immovable Inscription Documents in the World Heritage Site of the Tabriz Historic Bazaar Complex
Immovable inscriptions are considered as one of the most important works and among the historical documents in cultural assets of our dear country, which were installed on selected parts of historical buildings and outstanding monuments and were always noticeable. The role of inscriptions as the basic and effective tools is important in terms of manifesting and implication of educational and ed...
DLOLIS-A: Description Logic based Text Ontology Learning
Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning based knowledge extraction. While most works in this field have been primarily statistical (known as light-weight Ontology Learning) not much attempt has been...